We present an automatic method for annotating images of indoor scenes with the CAD models of the objects by relying on RGB-D scans. Through a visual evaluation by 3D experts, we show that our method retrieves annotations that are at least as accurate as manual annotations, and can thus be used as ground truth without the burden of manually annotating 3D data. We do this using an analysis-by-synthesis approach, which compares renderings of the CAD models with the captured scene. We introduce a 'cloning procedure' that identifies objects that have the same geometry, to annotate these objects with the same CAD models. This allows us to obtain complete annotations for the ScanNet dataset and the recent ARKitScenes dataset.
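As a toy illustration of the analysis-by-synthesis idea described above, the following sketch scores each candidate CAD placement by how well its rendered depth agrees with the captured RGB-D depth and keeps the best-scoring candidate (all names, tolerances and depth values are illustrative, not the paper's implementation):

```python
# Hypothetical toy: score CAD candidates by rendered-vs-captured depth agreement.

def depth_agreement(rendered, captured, tol=0.02):
    """Fraction of valid pixels whose rendered/captured depths agree within tol (meters)."""
    hits, valid = 0, 0
    for r, c in zip(rendered, captured):
        if r is None or c is None:      # pixel not covered by the model / sensor hole
            continue
        valid += 1
        if abs(r - c) <= tol:
            hits += 1
    return hits / valid if valid else 0.0

def best_candidate(candidates, captured):
    """Pick the CAD model rendering that best explains the captured depth."""
    return max(candidates, key=lambda name: depth_agreement(candidates[name], captured))

captured = [1.00, 1.01, 1.02, None, 1.05]           # captured depth row (None = hole)
candidates = {
    "chair_a": [1.00, 1.00, 1.03, 1.04, 1.50],      # mostly agrees with the scan
    "chair_b": [1.30, 1.31, 1.30, 1.29, 1.40],      # poor fit
}
print(best_candidate(candidates, captured))  # → chair_a
```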
We propose a novel approach for deep learning-based Multi-View Stereo (MVS). For each pixel in the reference image, our method leverages a deep architecture to search for the corresponding point in the source image directly along the corresponding epipolar line. We denote our method DELS-MVS: Deep Epipolar Line Search Multi-View Stereo. Previous works in deep MVS select a range of interest within the depth space, discretize it, and sample the epipolar line according to the resulting depth values: this can result in an uneven scanning of the epipolar line, hence of the image space. Instead, our method works directly on the epipolar line: this guarantees an even scanning of the image space and avoids both the need to select a depth range of interest, which is often not known a priori and can vary dramatically from scene to scene, and the need for a suitable discretization of the depth space. In fact, our search is iterative, which avoids the building of a cost volume, costly both to store and to process. Finally, our method performs a robust geometry-aware fusion of the estimated depth maps, leveraging a confidence predicted alongside each depth. We test DELS-MVS on the ETH3D, Tanks and Temples and DTU benchmarks and achieve competitive results with respect to state-of-the-art approaches.
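The uneven-scanning issue the abstract points to can be seen with a toy rectified-stereo example, where position along the epipolar line reduces to disparity = f*b/depth (the focal length, baseline and depth range below are made-up numbers, not DELS-MVS code):

```python
# Toy contrast between depth-space sampling and direct epipolar-line sampling.
f, b = 700.0, 0.2            # focal length (px) and baseline (m), illustrative

def disparity(depth):        # rectified case: epipolar position ~ disparity = f*b/depth
    return f * b / depth

# Sampling depth uniformly gives *uneven* steps along the epipolar line ...
depths = [0.5 + i * (10.0 - 0.5) / 9 for i in range(10)]
steps_depth_space = [disparity(depths[i]) - disparity(depths[i + 1]) for i in range(9)]

# ... whereas sampling the line directly gives even image-space steps.
d_near, d_far = disparity(0.5), disparity(10.0)
line = [d_far + i * (d_near - d_far) / 9 for i in range(10)]
steps_line = [line[i + 1] - line[i] for i in range(9)]

print(round(max(steps_depth_space) / min(steps_depth_space), 1))  # large: uneven scan
print(round(max(steps_line) / min(steps_line), 1))                # → 1.0
```

Sampling the line directly, as DELS-MVS does, also removes the need to guess a per-scene depth range up front.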
We propose an accurate and lightweight convolutional neural network for stereo estimation with depth completion. We name this method Fully-Convolutional Deformable Similarity Network with Depth Completion (FCDSN-DC). The method extends FC-DCNN by improving the feature extractor, adding a network structure for training highly accurate similarity functions, and adding a network structure for filling in inconsistent disparity estimates. The whole method consists of three parts. The first part consists of fully-convolutional densely connected layers that compute expressive features of the rectified image pair. The second part of our network learns a highly accurate similarity function between these learned features. It consists of densely connected convolutional layers, with a deformable convolution block at the end to further improve the accuracy of the results. After this step, an initial disparity map is created and a left-right consistency check is performed to remove inconsistent points. The last part of the network then uses this input, together with the corresponding left RGB image, to train a network that fills in the missing measurements. Consistent depth estimates are gathered around invalid points and passed, together with the RGB values, into a shallow CNN structure to recover the missing values. We evaluate our method on challenging real-world indoor and outdoor scenes, in particular Middlebury, KITTI and ETH3D, where it produces competitive results. We furthermore show that the method generalizes well and is well suited for many applications without further training. The code for our full framework is available at: https://github.com/thedodo/fcdsn-dc
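The left-right consistency check mentioned above can be sketched on a toy 1-D disparity row (threshold and data are illustrative, not FCDSN-DC's actual values): a left disparity is kept only if the right image's disparity at the matched position points back within a small tolerance; the rest are invalidated and would be filled in by the depth-completion stage.

```python
# Toy left-right consistency check on one scanline of disparities.

def lr_consistency(disp_left, disp_right, tol=1):
    out = []
    for x, d in enumerate(disp_left):
        xr = x - d                          # matched column in the right image
        if 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= tol:
            out.append(d)                   # consistent: keep
        else:
            out.append(None)                # inconsistent or out of bounds: invalidate
    return out

disp_left  = [2, 2, 2, 5, 2]
disp_right = [2, 2, 2, 2, 2]
print(lr_consistency(disp_left, disp_right))  # → [None, None, 2, None, 2]
```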
Establishing a sparse set of keypoint correspondences between images is a fundamental task in many computer vision pipelines. Typically, this translates into a computationally expensive nearest-neighbor search, in which every keypoint descriptor of one image must be compared against all descriptors of the other image. To reduce the computational cost of the matching phase, we propose a deep feature extraction network capable of detecting complementary sets of keypoints in each image. Since only descriptors within the same set need to be compared across the different images, the computational complexity of the matching phase decreases with the number of sets. We train our network to predict the keypoints and to jointly compute the corresponding descriptors. In particular, in order to learn complementary sets of keypoints, we introduce a novel unsupervised loss that penalizes intersections between the different sets. Additionally, we propose a novel descriptor-based weighting scheme designed to penalize the detection of keypoints with non-discriminative descriptors. Through extensive experiments, we show that our feature extraction network, trained only on synthetically warped images and in a fully unsupervised manner, achieves competitive results on 3D reconstruction and re-localization tasks at a reduced matching complexity.
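The complexity reduction from complementary sets can be illustrated with a counting toy (the round-robin split below is an arbitrary stand-in for the learned partition): with K sets, each descriptor is compared only against the corresponding set in the other image, so brute-force comparisons drop roughly by a factor of K.

```python
# Toy count of descriptor comparisons: all-pairs vs. per-set matching.

def count_comparisons(sets_a, sets_b):
    # Only same-index sets are compared across the two images.
    return sum(len(a) * len(b) for a, b in zip(sets_a, sets_b))

N, K = 120, 4                                       # keypoints per image, number of sets
kps_a = list(range(N))
kps_b = list(range(N))
split = lambda kps: [kps[i::K] for i in range(K)]   # stand-in for the learned partition

full = N * N                                        # naive all-pairs matching
partitioned = count_comparisons(split(kps_a), split(kps_b))
print(full, partitioned, full // partitioned)       # → 14400 3600 4
```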
We propose a new method, applicable to many scene understanding problems, that adapts the Monte Carlo Tree Search (MCTS) algorithm, originally designed to learn to play games of high state complexity. From a pool of generated proposals, our method jointly selects and optimizes the proposals that minimize an objective term. In our first application, floor plan reconstruction from point clouds, our method selects and refines room proposals modeled as 2D polygons by optimizing an objective function that adaptively combines the predictions of a deep network with a fit to the room shapes. We also introduce a novel differentiable method for rendering the polygonal shapes of these proposals. Our evaluation on the recent and challenging Structured3D and Floor-SP datasets shows significant improvements over the state of the art, without imposing hard constraints or making assumptions about floor plan configurations. In our second application, we extend the method to reconstruct general 3D room layouts from color images and obtain accurate room layouts. We also show that our differentiable renderer can easily be extended to render 3D planar polygons and polygon embeddings. Our method shows high performance on the Matterport3D-Layout dataset without introducing hard constraints on room layout configurations.
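The joint selection problem can be sketched with a brute-force stand-in (the paper scales this with MCTS instead; proposals, cells and the objective below are toy constructions): each room proposal covers a set of point-cloud cells, and the objective rewards coverage while penalizing overlap between selected rooms.

```python
# Toy proposal-selection objective, solved by brute force over subsets.
from itertools import combinations

proposals = {"room_a": {1, 2, 3}, "room_b": {3, 4}, "room_c": {4, 5, 6}}
scene = {1, 2, 3, 4, 5, 6}                          # cells that should be covered

def objective(selected):
    covered = set().union(*(proposals[p] for p in selected)) if selected else set()
    overlap = sum(len(proposals[p]) for p in selected) - len(covered)
    return len(scene - covered) + overlap           # uncovered cells + double-covered cells

best = min(
    (subset for r in range(len(proposals) + 1)
            for subset in combinations(proposals, r)),
    key=objective,
)
print(sorted(best))  # → ['room_a', 'room_c']
```

Exhaustive search is exponential in the number of proposals, which is why a tree search with learned guidance is the interesting part.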
Most state-of-the-art instance segmentation methods produce binary segmentation masks; however, geographic and cartographic applications typically require precise vector polygons of the extracted objects rather than rasterized output. This paper introduces PolyWorld, a neural network that directly extracts building vertices from an image and connects them correctly to create precise polygons. The model predicts the connection strength between each pair of vertices using a graph neural network, and estimates the assignments by solving a differentiable optimal transport problem. Moreover, the vertex positions are optimized by minimizing a combined segmentation and polygonal angle difference loss. PolyWorld significantly outperforms the state of the art in building polygonization and, beyond notable quantitative results, also produces visually pleasing building polygons. The code and trained weights will soon be available on GitHub.
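A minimal sketch of the assignment step, assuming a Sinkhorn-style normalization as the differentiable optimal transport solver (the 3x3 score matrix and iteration count are illustrative): alternating row/column normalization turns raw connection scores into a soft assignment, and the per-row argmax links each vertex to its successor in the polygon.

```python
# Toy Sinkhorn normalization of a vertex connection-score matrix.

def sinkhorn(scores, iters=50):
    P = [[v for v in row] for row in scores]
    for _ in range(iters):
        for row in P:                               # row normalization
            s = sum(row)
            row[:] = [v / s for v in row]
        for j in range(len(P[0])):                  # column normalization
            s = sum(P[i][j] for i in range(len(P)))
            for i in range(len(P)):
                P[i][j] /= s
    return P

scores = [[0.2, 1.6, 0.2],      # vertex 0 connects strongly to vertex 1
          [0.1, 0.2, 1.5],      # vertex 1 to vertex 2
          [2.0, 0.3, 0.2]]      # vertex 2 back to vertex 0
P = sinkhorn(scores)
links = [max(range(3), key=lambda j: P[i][j]) for i in range(3)]
print(links)  # → [1, 2, 0], i.e. a closed 0 -> 1 -> 2 -> 0 polygon
```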
We present a deep learning-based multi-view stereo method. Our method estimates high-resolution and highly accurate depth maps by traversing the continuous space of feasible depth values at each pixel in a binary decision fashion. The decision process leverages a deep network architecture: it computes a pixel-wise binary mask that establishes whether each pixel's actual depth lies in front of or behind its current depth hypothesis at that iteration. Moreover, in order to handle occluded regions, at each iteration the results from the different source images are fused using pixel-wise weights estimated by a second network. Thanks to the adopted binary decision strategy, which permits an efficient exploration of the depth space, our method can process high-resolution images without trading off resolution and precision. This sets it apart from most learning-based multi-view stereo methods, where the explicit discretization of the depth space requires the processing of large cost volumes. We compare our method against state-of-the-art multi-view stereo methods on the DTU, Tanks and Temples, and the challenging ETH3D benchmarks, and show competitive results.
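The binary decision traversal can be sketched for a single pixel, with the network's front/behind prediction replaced by a perfect oracle (ranges, iteration count and the true depth are illustrative): each iteration halves the feasible depth interval, so no dense cost volume is ever built.

```python
# Toy bisection over depth for one pixel, driven by a front/behind decision.

def refine_depth(is_behind, lo, hi, iters=20):
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if is_behind(mid):      # decision: true depth lies behind the hypothesis
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

true_depth = 3.217
oracle = lambda hyp: true_depth > hyp     # stand-in for the network's binary mask
est = refine_depth(oracle, lo=0.1, hi=100.0)
print(round(est, 3))  # → 3.217
```

Twenty decisions shrink a ~100 m interval to sub-millimeter width, versus hundreds of hypotheses in a discretized cost volume.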
In this paper, we propose a novel deep neural network architecture for joint class-agnostic object segmentation and grasp detection for robotic picking tasks using a parallel-plate gripper. We introduce depth-aware Coordinate Convolution (CoordConv), a method to increase the accuracy of point-proposal-based object instance segmentation in complex scenes, without adding any additional network parameters or computational complexity. Depth-aware CoordConv uses depth data to extract prior information about the location of an object, enabling highly accurate object instance segmentation. The resulting segmentation masks, combined with predicted grasp candidates, lead to a complete scene description for grasping with a parallel-plate gripper. We evaluate the accuracy of grasp detection and instance segmentation on challenging robotic picking datasets, namely Siléane and OCID_grasp, and demonstrate the benefit of joint grasp detection and segmentation on a real-world robotic picking task.
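The input side of a depth-aware CoordConv layer can be sketched in pure Python (shapes and normalization are illustrative assumptions, not the paper's exact recipe): normalized x/y coordinate maps plus the depth map are appended as extra channels, giving subsequent convolutions a spatial-plus-depth prior at no extra parameter cost for the added channels themselves.

```python
# Toy construction of the extra input channels for a depth-aware CoordConv.

def coordconv_channels(depth):
    h, w = len(depth), len(depth[0])
    xs = [[x / (w - 1) for x in range(w)] for _ in range(h)]   # x coord in [0, 1]
    ys = [[y / (h - 1) for _ in range(w)] for y in range(h)]   # y coord in [0, 1]
    return [xs, ys, depth]                                     # channels to concatenate

depth = [[0.9, 0.9, 2.0],
         [0.9, 0.9, 2.0]]      # toy 2x3 depth map (meters)
xs, ys, d = coordconv_channels(depth)
print(xs[0])  # → [0.0, 0.5, 1.0]
print(ys[1])  # → [1.0, 1.0, 1.0]
```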
In the era of noisy intermediate-scale quantum devices, variational quantum circuits (VQCs) are currently one of the main strategies for building quantum machine learning models. These models are made up of a quantum part and a classical part. The quantum part is given by a parametrization $U$, which, in general, is obtained from the product of different quantum gates. In turn, the classical part corresponds to an optimizer that updates the parameters of $U$ in order to minimize a cost function $C$. However, despite the many applications of VQCs, there are still questions to be answered, such as: What is the best sequence of gates to use? How should their parameters be optimized? Which cost function should be used? How does the architecture of the quantum chips influence the final results? In this article, we focus on answering the last question. We show that, in general, the cost function will tend to a typical average value the closer the parametrization used is to a $2$-design. Therefore, the closer this parametrization is to a $2$-design, the less the result of the quantum neural network model will depend on its parametrization. As a consequence, we can use the architecture of the quantum chips themselves to define the VQC parametrization, avoiding the use of additional swap gates and thus diminishing the VQC depth and the associated errors.
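The concentration toward a typical average value can be sanity-checked in the smallest case with a Monte Carlo sketch (a single qubit with cost $C = \langle\psi|Z|\psi\rangle$; the sampling scheme is a standard Haar parametrization, not tied to any specific chip): for Haar-random states, which reproduce $2$-design averages, the cost concentrates around $\mathrm{Tr}(Z)/d = 0$.

```python
# Toy Monte Carlo: 2-design-averaged cost of <psi|Z|psi> for one Haar-random qubit.
import random
random.seed(1)

def haar_cost(n_samples=200000):
    # For |psi> = U|0> Haar-random on one qubit, cos(theta) is uniform on [-1, 1]
    # and the cost C = <psi|Z|psi> equals cos(theta).
    costs = [random.uniform(-1.0, 1.0) for _ in range(n_samples)]
    mean = sum(costs) / n_samples
    var = sum(c * c for c in costs) / n_samples - mean * mean
    return mean, var

mean, var = haar_cost()
print(round(mean, 2), round(var, 2))  # mean ~ Tr(Z)/d = 0, variance ~ 1/3
```

For more qubits the variance of such Haar averages shrinks with the Hilbert-space dimension, which is the flip side of the parametrization-independence noted above.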
The evolution of wireless communications into 6G and beyond is expected to rely on new machine learning (ML)-based capabilities. These can enable proactive decisions and actions from wireless-network components to sustain quality-of-service (QoS) and user experience. Moreover, new use cases in the area of vehicular and industrial communications will emerge. Specifically in the area of vehicle communication, vehicle-to-everything (V2X) schemes will benefit strongly from such advances. With this in mind, we have conducted a detailed measurement campaign with the purpose of enabling a plethora of diverse ML-based studies. The resulting datasets offer GPS-located wireless measurements across diverse urban environments for both cellular (with two different operators) and sidelink radio access technologies, thus enabling a variety of different studies towards V2X. The datasets are labeled and sampled with a high time resolution. Furthermore, we make the data publicly available with all the necessary information to support the on-boarding of new researchers. We provide an initial analysis of the data showing some of the challenges that ML needs to overcome and the features that ML can leverage, as well as some hints at potential research studies.